## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] LaplacesDemon_16.1.6 mvtnorm_1.1-1 reshape2_1.4.4
## [4] dplyr_1.0.0 gtools_3.8.2 rstan_2.21.2
## [7] ggplot2_3.3.2 StanHeaders_2.21.0-5 mclust_5.4.6
## [10] lme4_1.1-23 Matrix_1.2-18 mgcv_1.8-31
## [13] nlme_3.1-148 RColorBrewer_1.1-2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5.1 lattice_0.20-41 prettyunits_1.1.1 ps_1.3.3
## [5] assertthat_0.2.1 digest_0.6.25 V8_3.2.0 R6_2.4.1
## [9] plyr_1.8.6 stats4_4.0.2 evaluate_0.14 pillar_1.4.6
## [13] rlang_0.4.7 curl_4.3 minqa_1.2.4 callr_3.4.3
## [17] nloptr_1.2.2.2 rmarkdown_2.3 splines_4.0.2 statmod_1.4.34
## [21] stringr_1.4.0 loo_2.3.1 munsell_0.5.0 compiler_4.0.2
## [25] xfun_0.15 pkgconfig_2.0.3 pkgbuild_1.1.0 htmltools_0.5.0
## [29] tidyselect_1.1.0 tibble_3.0.3 gridExtra_2.3 codetools_0.2-16
## [33] matrixStats_0.56.0 fansi_0.4.1 crayon_1.3.4 withr_2.2.0
## [37] MASS_7.3-51.6 grid_4.0.2 jsonlite_1.7.0 gtable_0.3.0
## [41] lifecycle_0.2.0 magrittr_1.5 scales_1.1.1 RcppParallel_5.0.2
## [45] cli_2.0.2 stringi_1.4.6 ellipsis_0.3.1 generics_0.0.2
## [49] vctrs_0.3.2 boot_1.3-25 tools_4.0.2 glue_1.4.1
## [53] purrr_0.3.4 processx_3.4.3 parallel_4.0.2 yaml_2.2.1
## [57] inline_0.3.15 colorspace_1.4-1 knitr_1.29
Load dataset
Check outliers
## The dataset contains 2649 patients with measured PfHRP2 and measured platelet counts from 4 studies
## Patients per study:
##
##       Bangladesh   FEAST (Uganda) Kampala (Uganda)   Kilifi (Kenya)
##              172              567              492             1418
Some data cleaning

##
## FALSE  TRUE
##     6  2643
##
##     Bangladesh FEAST (Uganda) Kilifi (Kenya)
##              1              8             18

## A total of 27 samples have zero PfHRP2 but more than 1000 parasites per uL
## After excluding the HRP2 outliers, the dataset contains 2622 patients with measured PfHRP2 and measured platelet counts
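The exclusion rule above can be sketched as follows. This is a minimal illustration on an invented data frame; the column names study, hrp2 and para are assumptions and may differ from those used in the actual analysis.

```r
# Toy data: hrp2 (PfHRP2 concentration) and para (parasites per uL) are
# assumed column names; the values are invented for illustration only.
dat <- data.frame(
  study = c("A", "A", "B", "B", "B"),
  hrp2  = c(0, 250, 0, 0, 1200),
  para  = c(5000, 300, 200, 40000, 90000)
)

# Flag samples with zero PfHRP2 despite more than 1000 parasites per uL
hrp2_outlier <- dat$hrp2 == 0 & dat$para > 1000

# Count the flagged samples per study, then drop them
outliers_per_study <- table(dat$study[hrp2_outlier])
dat_clean <- dat[!hrp2_outlier, ]

print(outliers_per_study)
print(nrow(dat_clean))
```

A sample with zero PfHRP2 and a low parasite count (the third row of the toy data) is kept, since only the combination of both conditions is treated as implausible.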
Overview of patient characteristics
Results for Table 1 in the paper
##
##       Bangladesh   FEAST (Uganda) Kampala (Uganda)   Kilifi (Kenya)
##              171              559              492             1400
##
##     Bangladesh FEAST (Uganda) Kampala (Uganda) Kilifi (Kenya)
## 0          0            227                0              0
## 1        171            332              492           1400
##              study age.lower age.median age.upper
## 1       Bangladesh      23.5       30.0      45.0
## 2   FEAST (Uganda)       1.2        2.0       3.3
## 3 Kampala (Uganda)       2.2        3.3       4.6
## 4   Kilifi (Kenya)       1.4        2.4       3.7
##              study hrp2
## 1       Bangladesh  171
## 2   FEAST (Uganda)  559
## 3 Kampala (Uganda)  492
## 4   Kilifi (Kenya) 1400
##              study outcome
## 1       Bangladesh    26.9
## 2   FEAST (Uganda)    11.4
## 3 Kampala (Uganda)     6.7
## 4   Kilifi (Kenya)    11.1
##              study platelet.25% platelet.50% platelet.75%
## 1       Bangladesh         27.0         50.0        139.0
## 2   FEAST (Uganda)         74.5        165.0        326.0
## 3 Kampala (Uganda)         49.0         96.0        169.5
## 4   Kilifi (Kenya)         64.0        111.0        215.0
##              study  hrp2.25%  hrp2.50%  hrp2.75%
## 1       Bangladesh 1082.9050 2667.0400 6127.5550
## 2   FEAST (Uganda)    0.0000  174.7100 1952.6900
## 3 Kampala (Uganda)  588.0000 1838.4000 4097.4000
## 4   Kilifi (Kenya)  418.7393 2206.7408 5071.5299
##              study wbc.25% wbc.50% wbc.75%
## 1       Bangladesh   6.900   9.000  11.000
## 2   FEAST (Uganda)   8.400  11.950  18.675
## 3 Kampala (Uganda)   7.500  10.400  15.300
## 4   Kilifi (Kenya)   8.900  12.550  19.000
##              study para.25% para.50% para.75%
## 1       Bangladesh    23550   148874   348540
## 2   FEAST (Uganda)     3640    37600   153680
## 3 Kampala (Uganda)    10635    42530   198540
## 4   Kilifi (Kenya)     6099    69824   316350
##
##                     1    2    3
## Bangladesh          0    0    0
## FEAST (Uganda)    466   46   21
## Kampala (Uganda)  463    4   23
## Kilifi (Kenya)   1348   41    7
##   as.numeric(dat_all$platelet <= 150) hrp2.25% hrp2.50% hrp2.75%
## 1                                   0       24      269     1043
## 2                                   1     1261     3031     6035
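Per-study quartile tables of this shape can be produced with aggregate() and quantile(). The sketch below uses an invented data frame; the column names and the derived thrombocytopenia indicator are assumptions made for illustration.

```r
# Toy data; study, platelet and hrp2 are assumed column names
dat <- data.frame(
  study    = rep(c("A", "B"), each = 5),
  platelet = c(10, 20, 30, 40, 50, 100, 200, 300, 400, 500),
  hrp2     = c(0, 5, 10, 15, 20, 1000, 2000, 3000, 4000, 5000)
)

# Quartiles of the platelet count by study; aggregate() prints the
# result with columns platelet.25%, platelet.50% and platelet.75%
platelet_q <- aggregate(platelet ~ study, data = dat,
                        FUN = quantile, probs = c(0.25, 0.5, 0.75))

# Quartiles of HRP2 by thrombocytopenia status (platelet count <= 150)
dat$thrombocytopenia <- as.numeric(dat$platelet <= 150)
hrp2_q <- aggregate(hrp2 ~ thrombocytopenia, data = dat,
                    FUN = quantile, probs = c(0.25, 0.5, 0.75))

print(platelet_q)
print(hrp2_q)
```

Note that aggregate() stores the three quartiles as a matrix column, which is why the printed headers take the form variable.25%, variable.50%, variable.75%.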
Correlation between the platelet count and the PfHRP2 concentration
## African sites: correlation:
##
## Pearson's product-moment correlation
##
## data: log10(dat_all$platelet[ind_Africa]) and log10(dat_all$hrp2 + 1)[ind_Africa]
## t = -31.892, df = 2449, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5690929 -0.5131183
## sample estimates:
## cor
## -0.5417058
## [1] -186.2477
## cor
## -0.54
## Bangladesh: correlation:
##
## Pearson's product-moment correlation
##
## data: log10(dat_all$platelet[!ind_Africa]) and log10(dat_all$hrp2 + 1)[!ind_Africa]
## t = -4.9404, df = 169, p-value = 1.863e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4797402 -0.2167256
## sample estimates:
## cor
## -0.3552439
## [1] -5.72988
## cor
## -0.36
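The correlation tests above can be reproduced in outline as follows. The data here are simulated, so the numbers will not match the report. The unlabelled values printed after each test (-186.2477 and -5.72988) are consistent with the log10 of the two-sided p-value; that reading is an inference from the numbers, not something stated in the output. The helper log10_pvalue is a hypothetical name introduced for this sketch.

```r
# Simulated stand-in data: platelet and hrp2 are assumed variable names
set.seed(1)
n <- 200
log_hrp2 <- rnorm(n, mean = 3, sd = 1)
hrp2     <- 10^log_hrp2
platelet <- 10^(2.5 - 0.3 * log_hrp2 + rnorm(n, sd = 0.3))

# Pearson correlation on the log10 scale; the +1 keeps zero HRP2 values finite
ct <- cor.test(log10(platelet), log10(hrp2 + 1))

# log10 of a two-sided p-value from a t statistic, computed on the log scale
# so that it does not underflow when the p-value is extremely small
log10_pvalue <- function(t_stat, df) {
  (pt(-abs(t_stat), df = df, log.p = TRUE) + log(2)) / log(10)
}

print(round(unname(ct$estimate), 2))
print(log10_pvalue(unname(ct$statistic), unname(ct$parameter)))

# Applied to the t statistics and degrees of freedom reported above
print(log10_pvalue(-31.892, 2449))  # African sites
print(log10_pvalue(-4.9404, 169))   # Bangladesh
```

Working on the log scale matters for the African sites, where cor.test() can only report the p-value as below 2.2e-16.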
Summary plot of the biomarker data

Basic data exploration: clustering with mclust
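mclust is a general-purpose model-based clustering package: it fits Gaussian finite mixture models (mixtures of multivariate normals) to the data using the EM algorithm and compares candidate models by BIC.
We pool the data from all studies into a single dataset and fit mclust to it.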
## [1] 2477
##
##       Bangladesh   FEAST (Uganda) Kampala (Uganda)   Kilifi (Kenya)
##              170              425              484             1398
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVE (ellipsoidal, equal orientation) model with 3 components:
##
## log-likelihood n df BIC ICL
## -6826.373 2477 23 -13832.49 -14386.62
##
## Clustering table:
## 1 2 3
## 910 148 1419
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVE (ellipsoidal, equal orientation) model with 4 components:
##
## log-likelihood n df BIC ICL
## -3702.525 2477 20 -7561.347 -8380.208
##
## Clustering table:
## 1 2 3 4
## 139 770 80 1488

## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEV (ellipsoidal, equal shape) model with 4 components:
##
## log-likelihood n df BIC ICL
## -5841.928 2477 20 -11840.15 -12924.89
##
## Clustering table:
## 1 2 3 4
## 898 859 574 146

## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VVE (ellipsoidal, equal orientation) model with 3 components:
##
## log-likelihood n df BIC ICL
## -4172.268 2477 15 -8461.758 -9076.87
##
## Clustering table:
## 1 2 3
## 962 196 1319
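The BIC values in the four summaries above follow the mclust convention, in which BIC is defined as twice the log-likelihood minus the number of parameters times log(n), so that larger values indicate a better model. As a quick check, the sketch below recomputes the BIC from the printed log-likelihood, df and n. The function name mclust_bic is introduced here for illustration only.

```r
# BIC in the convention used by mclust (larger is better):
#   BIC = 2 * log-likelihood - df * log(n)
mclust_bic <- function(loglik, df, n) {
  2 * loglik - df * log(n)
}

# Log-likelihoods and degrees of freedom copied from the summaries above
fits <- data.frame(
  model  = c("VVE, 3 components", "VVE, 4 components",
             "VEV, 4 components", "VVE, 3 components"),
  loglik = c(-6826.373, -3702.525, -5841.928, -4172.268),
  df     = c(23, 20, 20, 15)
)
fits$BIC <- mclust_bic(fits$loglik, fits$df, n = 2477)

print(fits)
```

Each summary is fitted to a different set of variables, so BIC values are only comparable between models fitted to the same data.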

##
##                     1    2    3    4
## Bangladesh          0   28    3  139
## FEAST (Uganda)    136  114   36  139
## Kampala (Uganda)    0  154    9  321
## Kilifi (Kenya)      3  474   32  889
In conclusion, the distribution of the parasite count is less easily decomposed into components than that of the platelet count or the HRP2 concentration. The combination of HRP2 and platelet count is the only pairing for which the estimated break between clusters is not orthogonal to either variable.
Fitting a mixture model to platelets and HRP2
Two-component mixture, not including FEAST
Main model
This analysis excludes the FEAST study and uses only the severe malaria studies.
We convert the platelet counts and HRP2 measurements to the log10 scale. The log10 platelet counts are then multiplied by -1, so that increasing values of both biomarkers correspond to a higher likelihood of severe malaria. This orientation is needed because the underlying Stan model uses the ordered vector type for the component means, which avoids label-switching problems.
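A minimal sketch of this transformation, using invented values; the variable names are assumptions, and any handling of zero HRP2 values before taking logs is not shown.

```r
# Toy values; platelet and hrp2 are assumed variable names
platelet <- c(250, 150, 75, 20)
hrp2     <- c(200, 800, 3000, 9000)

# Column 1: minus log10 platelet count; column 2: log10 HRP2.
# On both columns, larger values now point towards severe malaria.
X <- cbind(neg_log10_platelet = -log10(platelet),
           log10_hrp2         = log10(hrp2))

print(round(X, 3))
```

With both columns oriented the same way, the mean of the severe malaria component can be constrained to be larger than the mean of the other component on each biomarker, which is what an ordered vector expresses.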
## site_index_SMstudies
## 1 2 3
## 171 492 1400
## [1] 2063
## Priors on mean biomarker values:
## [,1] [,2]
## [1,] -2.39794 -1.875061
## [2,] 2.30103 3.477121
## Priors on standard deviations around biomarker values:
## [,1] [,2]
## [1,] 0.1 0.1
## [2,] 0.1 0.1
## Priors on prevalence of SM:
## [,1] [,2]
## [1,] 19 1
## [2,] 14 6
## [3,] 14 6
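To make the printed priors easier to read, the sketch below relates them to the original measurement scales. Two things here are assumptions rather than statements from the source: that the prior means were chosen as the round biomarker values 250, 75, 200 and 3000 (the printed numbers coincide with their log10 transforms), and that each row of the prevalence prior gives the two shape parameters of a Beta distribution, with one row per site.

```r
# The printed prior means coincide with these round-number values
# (mapping inferred from the numbers, not stated in the source)
prior_mu <- rbind(c(-log10(250), -log10(75)),
                  c( log10(200),  log10(3000)))
print(round(prior_mu, 6))

# Reading each row of the prevalence prior as Beta(a, b) shape parameters
# (an assumption), summarise the implied prior mean and 95% interval
prior_prev <- rbind(c(19, 1), c(14, 6), c(14, 6))
beta_summary <- t(apply(prior_prev, 1, function(ab) {
  c(mean  = ab[1] / sum(ab),
    lower = qbeta(0.025, ab[1], ab[2]),
    upper = qbeta(0.975, ab[1], ab[2]))
}))
print(round(beta_summary, 3))
```

Under this reading, the first site has a prior mean prevalence of 0.95 and the other two sites a prior mean of 0.70.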
Compile the Stan model
Run the Stan models; the outputs are stored in Rout
We check convergence by inspecting the traceplots
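Traceplots are a visual check. A numerical companion is the potential scale reduction factor (R-hat), which compares between-chain and within-chain variance and should be close to 1 for well-mixed chains. The sketch below implements the basic Gelman-Rubin version on simulated chains to show the idea; it is not the calculation used in this analysis, and rstan reports its own, more refined R-hat in the fit summary. The function name rhat_basic is introduced here for illustration.

```r
# Basic potential scale reduction factor (Gelman-Rubin R-hat) for one parameter.
# draws: matrix of posterior draws, iterations in rows, chains in columns.
rhat_basic <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  chain_vars  <- apply(draws, 2, var)
  W <- mean(chain_vars)        # within-chain variance
  B <- n * var(chain_means)    # between-chain variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

set.seed(42)
# Four simulated chains sampling the same distribution (well mixed)
good <- matrix(rnorm(4000), ncol = 4)
# The same chains, with one chain shifted to a different location
bad <- good
bad[, 1] <- bad[, 1] + 3

rhat_good <- rhat_basic(good)
rhat_bad  <- rhat_basic(bad)
print(c(good = rhat_good, bad = rhat_bad))
```

A chain that settles in a different region inflates the between-chain variance and pushes R-hat well above 1, which on a traceplot would show up as one trace sitting apart from the others.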



